

 black and white photo



Measuring similarity between embedding spaces using induced neighborhood graphs

arXiv.org Artificial Intelligence

Deep learning techniques have excelled at generating embedding spaces that capture semantic similarities between items. Often these representations are paired, enabling experiments with analogies (pairs within the same domain) and cross-modality (pairs across domains). These experiments rest on specific assumptions about the geometry of embedding spaces: paired items can be found by extrapolating the positional relationships between embedding pairs in the training dataset, which enables tasks such as finding new analogies and multimodal zero-shot classification. In this work, we propose a metric to evaluate the similarity between paired item representations. Our proposal is built from the structural similarity between the nearest-neighbor induced graphs of each representation, and can be configured to compare spaces based on different distance metrics and different neighborhood sizes. We demonstrate that our proposal can identify similar structures at different scales, which is hard to achieve with kernel methods such as Centered Kernel Alignment (CKA). We further illustrate our method with two case studies: an analogy task using GloVe embeddings, and zero-shot classification on the CIFAR-100 dataset using CLIP embeddings. Our results show that accuracy in both analogy and zero-shot classification tasks correlates with embedding similarity. These findings can help explain performance differences in these tasks, and may lead to improved design of paired-embedding models in the future.
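The abstract does not give the metric's exact formula. As a rough illustration of the underlying idea, the sketch below builds a k-nearest-neighbor set for every item in each space and averages the Jaccard overlap of each item's two neighborhoods. The Euclidean distance, the Jaccard score, and the names `knn_sets` and `knn_graph_similarity` are assumptions made for illustration, not the authors' definition; `k` plays the role of the configurable neighborhood size.

```python
import numpy as np


def knn_sets(X: np.ndarray, k: int) -> list:
    """For each row of X, return the set of indices of its k nearest
    neighbors under Euclidean distance, excluding the point itself."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                     # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]
    return [set(row.tolist()) for row in idx]


def knn_graph_similarity(X: np.ndarray, Y: np.ndarray, k: int) -> float:
    """Hypothetical metric: mean Jaccard overlap of paired k-NN neighborhoods.

    Row i of X and row i of Y must describe the same item in two spaces;
    the spaces may have different dimensionality. Returns a value in [0, 1].
    """
    if X.shape[0] != Y.shape[0]:
        raise ValueError("both spaces must contain the same paired items")
    nx, ny = knn_sets(X, k), knn_sets(Y, k)
    return float(np.mean([len(a & b) / len(a | b) for a, b in zip(nx, ny)]))


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))
same = knn_graph_similarity(X, 2.0 * X, k=10)  # uniform rescaling keeps every neighborhood
unrelated = knn_graph_similarity(X, rng.normal(size=(100, 32)), k=10)
print(same, unrelated)
```

Because the score depends only on which items are neighbors, a uniformly rescaled copy of a space scores 1.0, while an unrelated random space scores near 0. Sweeping `k` from small to large is one way to probe structure at different scales.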


Bounding and Filling: A Fast and Flexible Framework for Image Captioning

arXiv.org Artificial Intelligence

Most image captioning models that follow an autoregressive manner suffer from significant inference latency. Several models have adopted a non-autoregressive manner to speed up the process. However, the vanilla non-autoregressive manner results in subpar performance: because it generates all words simultaneously, it fails to capture the relationships between words in a description. The semi-autoregressive manner employs a partially parallel method to preserve performance, but it sacrifices inference speed. In this paper, we introduce a fast and flexible framework for image captioning called BoFiCap, based on bounding and filling techniques. The BoFiCap model leverages the inherent characteristics of image captioning tasks to pre-define bounding boxes for image regions and their relationships. Subsequently, the BoFiCap model fills the corresponding words into each box using two generation manners. Leveraging the box hints, our filling process allows each word to better perceive other words. Additionally, our model offers flexible image description generation: 1) by employing different generation manners based on speed or performance requirements, and 2) by producing varied sentences based on user-specified boxes. Experimental evaluations on the MS-COCO benchmark dataset demonstrate that our framework in a non-autoregressive manner achieves the state of the art on the task-specific metric CIDEr (125.6) with a 9.22x speedup over the autoregressive baseline model; in a semi-autoregressive manner, our method reaches 128.4 on CIDEr with a 3.69x speedup. Our code and data are available at https://github.com/ChangxinWang/BoFiCap.
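The latency trade-off described above comes down to how many sequential decoder passes each manner needs. The sketch below counts those passes for a caption of a given length under generic definitions: autoregressive emits one word per pass, non-autoregressive emits all words in one pass, and semi-autoregressive emits a fixed-size group of words per pass. The function name and the `group_size` parameter are hypothetical, and this is not BoFiCap's box-based decoder; pass counts also do not translate directly into the measured wall-clock speedups reported in the abstract.

```python
import math


def sequential_steps(num_words: int, mode: str, group_size: int = 2) -> int:
    """Count the sequential decoder passes needed to emit num_words words.

    Generic illustration only; this is not BoFiCap's box-based decoding.
    """
    if mode == "autoregressive":
        return num_words  # one word per pass, each conditioned on earlier words
    if mode == "non-autoregressive":
        return 1  # every word predicted in a single parallel pass
    if mode == "semi-autoregressive":
        return math.ceil(num_words / group_size)  # group_size words per pass
    raise ValueError(f"unknown mode: {mode}")


for mode in ("autoregressive", "semi-autoregressive", "non-autoregressive"):
    print(mode, sequential_steps(12, mode, group_size=3))
```

For a 12-word caption with `group_size=3`, this prints 12, 4, and 1 passes respectively; the fewer the passes, the less each word can condition on words already generated, which is the quality cost the abstract attributes to parallel decoding.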


A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models

arXiv.org Artificial Intelligence

Contrastively trained text-image models have the remarkable ability to perform zero-shot classification, that is, classifying previously unseen images into categories that the model has never been explicitly trained to identify. However, these zero-shot classifiers need prompt engineering to achieve high accuracy. Prompt engineering typically requires hand-crafting a set of prompts for each downstream task. In this work, we aim to automate this prompt engineering and improve zero-shot accuracy through prompt ensembling. In particular, we ask: "Given a large pool of prompts, can we automatically score the prompts and ensemble those that are most suitable for a particular downstream dataset, without needing access to labeled validation data?" We demonstrate that this is possible. In doing so, we identify several pathologies in a naive prompt scoring method, where the score can easily be overconfident due to biases in the pre-training and test data, and we propose a novel prompt scoring method that corrects for these biases. Using our proposed scoring method to create a weighted-average prompt ensemble, our method outperforms an equal-average ensemble, as well as hand-crafted prompts, on ImageNet, 4 of its variants, and 11 fine-grained classification benchmarks, all while being fully automatic, optimization-free, and not requiring access to labeled validation data.
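A minimal NumPy sketch of score-weighted prompt ensembling, assuming the text and image embeddings have already been computed and L2-normalized (random unit vectors stand in for CLIP outputs here). The score used below, each prompt's average maximum softmax probability on unlabeled test images, is an assumed example of the kind of naive score the abstract says can be overconfident; the paper's bias-corrected score is not reproduced. The function names, `logit_scale`, and the temperature `tau` are hypothetical.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def naive_prompt_scores(text_emb, image_emb, logit_scale=100.0):
    """Naive, uncorrected score: mean max-softmax confidence per prompt.

    text_emb:  (P, C, D) one embedding per prompt template and class name
    image_emb: (N, D) embeddings of unlabeled test images
    returns:   (P,) one score per prompt template
    """
    logits = logit_scale * np.einsum("nd,pcd->pnc", image_emb, text_emb)
    probs = softmax(logits, axis=-1)         # class probabilities per prompt and image
    return probs.max(axis=-1).mean(axis=-1)  # average top-1 confidence over images


def weighted_ensemble(text_emb, scores, tau=0.01):
    """Turn prompt scores into weights and average the class embeddings."""
    weights = softmax(scores / tau)                         # (P,), sums to 1
    class_emb = np.einsum("p,pcd->cd", weights, text_emb)  # weighted average over prompts
    return normalize(class_emb), weights


def predict(image_emb, class_emb):
    """Zero-shot prediction: the class with the highest cosine similarity."""
    return np.argmax(image_emb @ class_emb.T, axis=-1)


# Random unit vectors stand in for real CLIP text and image embeddings.
rng = np.random.default_rng(0)
P, C, D, N = 5, 10, 64, 200
text_emb = normalize(rng.normal(size=(P, C, D)))
image_emb = normalize(rng.normal(size=(N, D)))

scores = naive_prompt_scores(text_emb, image_emb)
class_emb, weights = weighted_ensemble(text_emb, scores)
preds = predict(image_emb, class_emb)
print(weights.round(3), preds[:10])
```

Passing equal scores (or a very large `tau`) makes every weight `1/P`, which recovers the equal-average ensemble the abstract uses as a baseline. Swapping in a better-calibrated scoring function only changes `naive_prompt_scores`; the ensembling and prediction steps stay the same.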


Google AI recreates Gustav Klimt paintings destroyed during WWII

#artificialintelligence

Gustav Klimt created some of the world's most expensive masterpieces, but around 20% of his artworks have been lost. Among them are the so-called Faculty Paintings: Philosophy, Medicine, and Jurisprudence. The three pieces are believed to have been destroyed in a fire during World War Two. Only black and white photos of the artworks remain. The original paintings may never be seen again, but machine learning has come close to bringing them back to life.


NASA's lunar probe snaps eerie black and white image of Jupiter and two of its moons

Daily Mail - Science & tech

NASA's Lunar Reconnaissance Orbiter (LRO), which is focused on observing the moon in preparation for humanity's return there, has snapped an eerie black and white photo of Jupiter and two of its moons. The LRO, which launched in June 2009, captured Jupiter and its moons Io and Europa from 390 million miles away. The spacecraft orbits roughly 62 miles (100 km) above the surface of the moon, which is 239,000 miles from Earth. Given the extreme distance between the moon and the gas giant, and the fact that the LRO is 'aging' according to a statement, the image is a technical feat. 'Because the Lunar Reconnaissance Orbiter spacecraft is aging (LRO launched over 12 years ago), it now only uses its two star trackers to keep tabs on where it is pointed, rather than its inertial measurement unit, which adds complications to imaging anywhere but straight down at the lunar surface (we don't want the star trackers pointed at the Moon rather than the stars!),' Brett Denevi, deputy principal investigator for the LRO Camera, said in a statement.


AI photo tool 'simulates travelling back in time with a modern camera'

Daily Mail - Science & tech

US researchers have created a photo colourising tool that uses artificial intelligence (AI) to create eerily lifelike images of deceased historical figures.


Deep Learning based image colorization with OpenCV - CV-Tricks.com

#artificialintelligence

In India, we celebrated the festival of color "Holi" last week. We celebrate the end of winter with a splash of color because that's what spring will bring us in a few days. When I was young, the celebrations were sparse. It was the decade of frugal parenting. We waited for festivals eagerly because they meant parent-approved outings and fun.


Bringing black and white photos to life using Colourise.sg

#artificialintelligence

While it is impossible to replicate the exact conditions in which the original photo was taken, it is possible to add colour to the photo to help us imagine what the photographer could have seen in that instant. It is incredible -- almost magical -- how a little bit of colour can bring us that much closer to that specific moment in time. And as such, for our hackathon in January, our team decided to build a deep learning colouriser tool trained specifically for old Singaporean photos. If you have old black and white photos and would like to colourise them, you can do so here: Colourise.sg. We do not store any of the photos that you upload to our colouriser application.


The Key Differences Between Machine Learning and Artificial Intelligence

#artificialintelligence

Machine learning and artificial intelligence (known as A.I.) both sound like terms from some dystopian future where robots take over the planet. There is much similarity and overlap between the different types of computer-automated learning, inference, and autonomy, and each one comes with its own set of pros and cons. Sci-fi movies aside, there are important differences between deep learning, machine learning, and artificial intelligence that highlight the different ways in which they work and the applications they're best suited for. Here's what you need to know. Artificial intelligence is the earliest and broadest term for computers acting on their own.